Abstract. The original frequentist approach for computing confidence intervals involves the construction of the
confidence belt, which provides a mapping of the observation in data into a subset of values for the parameter. There are
different prescriptions for constructing the confidence belt; here we use the one provided by Feldman and Cousins.
Alternative methods based on the frequentist idea exist, including the delta likelihood method, the CLs method and a
method here referred to as the p-value method, which have all been commonly used in high energy experiments. The
purpose of this article is to draw attention to a series of potential problems when applying these alternative methods
to the important case where the predicted signal depends quadratically on the parameter of interest, a situation which
is common in high energy physics as it covers scenarios encountered in effective theories. These include anomalous
Higgs couplings and anomalous trilinear and quartic gauge couplings. It is found that the alternative methods, contrary
to the original method using the confidence belt, encode the goodness-of-fit into the confidence intervals and potentially
over-constrain the parameter.

PACS. XX.XX.XX No PACS code given 


1 Introduction 

The phenomenological description of Beyond the Standard
Model (BSM) physics in model-independent searches is typically
done in the framework of effective Lagrangians. The basic
assumption is that there exists new physics with degrees
of freedom so heavy that they cannot be produced directly at
present colliders such as the Large Hadron Collider (LHC).
The only observable effect is the modification of existing
interactions or the introduction of new interactions between the
Standard Model (SM) particles. These interactions are introduced
by adding new terms with associated couplings to the
SM Lagrangian; examples include anomalous Higgs couplings [?]
and anomalous trilinear [?] and quartic [?] gauge couplings.
The new terms in the Lagrangian are typically non-renormalisable,
which makes the differential cross section increase as a function
of energy and eventually violate S-matrix unitarity [?].

Since the new couplings enter linearly in the Lagrangian,
the differential cross section depends quadratically on the
couplings through the amplitude squared. The parabolic behaviour
of the differential cross section implies the existence of a lower
bound on the predicted signal. For the cases studied at the LHC,
such as anomalous Higgs couplings and anomalous trilinear
and quartic gauge couplings, this bound is typically located
close to or at the SM expectation. Consequently, experimental
outcomes which show distinct downward fluctuations with
respect to the SM expectation are not described by the model.

The inadequacy of the model to describe all experimental
outcomes does not indicate that the model is wrong, but
rather that it is sensitive to statistical fluctuations in a finite
data sample. It should also be emphasised that the parameter
of the model is unbounded in both a physical and a mathematical
sense. The problem is therefore not concerned with a parameter
boundary.

The study presented here is related to previous work in the
literature considering problems with quadratic parameter
dependence where the parameter is bounded, e.g. the well-known
problem of measuring neutrino masses [7]. However, in our
case, the parameter dependence is more complex due to the
presence of a linear term. For this reason, the standard
approach of restating the bound on the differential cross section
as a bound on the parameter squared cannot be employed.

We review a number of statistical methods currently used
for estimating couplings in effective field theories, highlighting
differences in the resulting confidence intervals. A comprehensive
treatment of confidence intervals is given in [?], including
the case of couplings in effective field theories [?], where the
approach by Feldman and Cousins [?] is presented, albeit in a
form which does not ensure proper confidence intervals.

This article is organised as follows: Section 2 describes
the theoretical bound on the predicted signal coming from the
quadratic parameter dependence for BSM contributions in
effective theories. Section 3 presents the most commonly used
frequentist methods for determining confidence intervals.
Section 4 introduces a set of distributions called the Baur set, which
systematically probes different regions in the observable,
including those not described by the model due to the bound.
The Baur set is used in section 5 for comparing the statistical
methods for the special case where the interference between
the SM and BSM terms is zero. In section 6, results are shown
for the general case with non-zero interference. Section 7 gives
the conclusion.


2 Theoretical bounds on the predicted signal 

In effective theories where the SM Lagrangian is extended with
an extra interaction term and a corresponding coupling strength
parameter, $\theta$, the differential cross section, $d\sigma/dx$, for a given
observable, $x$, depends quadratically on the parameter through
the amplitude squared,

$$\frac{d\sigma}{dx}(\theta) \propto \left| A_{\mathrm{SM}}(x) + \theta \cdot A_{\mathrm{BSM}}(x) \right|^2 , \qquad (1)$$

where $A_{\mathrm{SM}}(x)$ and $A_{\mathrm{BSM}}(x)$ denote the SM and BSM complex
amplitudes, respectively, and the dependence on $\theta$ has been
factored out from the BSM amplitude.

More explicitly, this means that the differential cross section
can be written in the quadratic form

$$\frac{d\sigma}{dx}(\theta) = a_0(x) + a_1(x) \cdot \theta + a_2(x) \cdot \theta^2 , \qquad (2)$$

where $a_i(x)$ are real numbers depending on $x$ and integrated
over all remaining phase space dependencies.

The first term, $a_0(x)$, denotes the point of expansion, which
is equivalent to the SM expectation. The coefficient in the linear
term, $a_1(x)$, represents the interference between the SM
and the BSM terms in the Lagrangian. The coefficient in the
quadratic term, $a_2(x)$, solely contains the contribution from the
BSM term in the Lagrangian.

The parabolic behaviour in equation (2) implies a bound on
the differential cross section, the observable effects of which
depend on the signs and the relative sizes of $a_0(x)$, $a_1(x)$ and
$a_2(x)$.

The sign of $a_2(x)$ determines whether the bound is a maximum
or a minimum. In effective field theories, the non-renormalisability
of the BSM contribution usually renders $a_2(x)$ positive,
such that the bound introduced is a lower bound. For the
following discussion, we assume that this is the case, but also
note that an upper bound would give rise to the same conclusions.

If $a_1(x)$ is relatively large¹ compared to $a_2(x)$, the extremum
in $d\sigma/dx$ is shifted significantly away from $\theta = 0$ and the signal
prediction behaves pseudo-linearly for small $|\theta|$. In this case,
the model is able to describe experimental outcomes with event
yields below the SM expectation, which means that the bound
on the differential cross section has a small effect as long as
the observation is not too far from the SM expectation relative
to the sensitivity. However, the linear term is often very small²
compared to the quadratic term and hence the bound is close to
$\theta = 0$. Consequently, even relatively small fluctuations away
from the SM expectation can result in an observation which is
not described by the model. A scenario of this kind is the main
focus of this article.
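The position and value of the lower bound follow directly from completing the square in equation (2). The short sketch below illustrates this for a single value of $x$; the function name and the numerical coefficients are hypothetical and chosen only for illustration, they are not taken from the analysis.

```python
def parabola_bound(a0, a1, a2):
    """Return (theta_min, sigma_min) for sigma(theta) = a0 + a1*theta + a2*theta**2.

    Assumes a2 > 0, so that the extremum of the parabola is a lower bound.
    """
    if a2 <= 0:
        raise ValueError("a2 must be positive for the bound to be a minimum")
    theta_min = -a1 / (2.0 * a2)            # position of the vertex
    sigma_min = a0 - a1 ** 2 / (4.0 * a2)   # value of the prediction at the vertex
    return theta_min, sigma_min


# Hypothetical coefficients: without interference the bound sits at the SM
# expectation (theta = 0); a small linear term shifts it only slightly.
no_interference = parabola_bound(a0=100.0, a1=0.0, a2=5.0e4)
with_interference = parabola_bound(a0=100.0, a1=400.0, a2=5.0e4)
```

With a vanishing linear term the minimum coincides with the SM expectation, so any downward fluctuation lies outside the range of the model.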

¹ The allowed range of $a_1(x)$ will be discussed in section 6.

² In fact, the linear term is completely absent if the BSM terms in
the Lagrangian are CP violating.


The quadratic parameter dependence can be generalised to
an expansion of any power larger than or equal to two,
corresponding to a higher order operator expansion,

$$\frac{d\sigma}{dx}(\theta) = \sum_{i=0}^{n} a_i(x) \cdot \theta^i , \quad n \geq 2 . \qquad (3)$$

The exact number of non-zero coefficients $a_i(x)$ is not important
to the arguments presented here as long as the highest
power in the expansion is even. If the highest power is odd,
many of the considerations presented here are still important,
depending on the specific physics model. In fact, since the
important feature is that the model is unable to describe all
possible experimental outcomes, the results presented here are not
limited to a power law expansion, but are relevant for any
function of $\theta$ for which this is the case.


3 Determination of confidence intervals 

In high energy physics, the most commonly used methods for
computing confidence intervals are the confidence belt, the delta
likelihood method, the CLs method, and a method here referred
to as the p-value method. This section gives brief descriptions
of these methods with emphasis on the specific properties which
are special for scenarios where the signal prediction depends
quadratically on the parameter of interest.

First, it is useful to recall the definition of a confidence
interval. A confidence interval is an interval estimate of a model
parameter which contains the unknown true value of the
parameter with a probability given by the confidence level. This
means that in the limit where the experiment is repeated an
infinite number of times and the confidence interval is recomputed
every time, the fraction of these confidence intervals
that contain the true value of the parameter is equal to the
confidence level.

This leads to the important concept of coverage probability.
Coverage probability is the proportion of the time that the
confidence interval contains the true value of the parameter. Thus,
it can be regarded as the actual confidence level of the computed
interval. Ideally, the coverage probability is equal to the
confidence level. If the coverage probability is smaller than the
confidence level, the confidence interval is termed permissive,
while it is termed conservative if the coverage probability is
greater than the confidence level.
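The definition of coverage probability can be made concrete with a small simulation. The sketch below is not the model studied in this article: it assumes a single Gaussian measurement with known resolution and a symmetric interval of half-width $z\sigma$, and simply counts how often the interval contains the true value. All names and numbers are illustrative assumptions.

```python
import random


def estimate_coverage(true_value, sigma, z, n_experiments, seed=1):
    """Fraction of repeated experiments whose interval contains the true value.

    Each pseudo experiment draws one Gaussian measurement x and reports the
    interval [x - z*sigma, x + z*sigma] (an assumed toy procedure).
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_experiments):
        x = rng.gauss(true_value, sigma)
        low, high = x - z * sigma, x + z * sigma
        if low <= true_value <= high:
            covered += 1
    return covered / n_experiments


# z = 1.96 corresponds to a nominal 95% confidence level for a Gaussian.
coverage = estimate_coverage(true_value=0.0, sigma=1.0, z=1.96, n_experiments=20000)
```

An estimate below the nominal level would indicate a permissive interval, and an estimate above it a conservative one.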

For illustrative purposes, a binned observable $x$ is introduced
using 10 bins in the range $[0, 1]$. The number of measurements,
also referred to as the number of events, in each bin
of the observable is governed by Poisson statistics. The likelihood
function is defined as the product of the probabilities for
the individual Poisson processes, i.e.

$$\mathcal{L}(\theta) = \prod_{i=1}^{10} \frac{\mu_i(\theta)^{n_i}}{n_i!} \, e^{-\mu_i(\theta)} , \qquad (4)$$

where the expected number of events in the $i$th bin is given by
$\mu_i(\theta)$, which depends quadratically on $\theta$, i.e.

$$\mu_i(\theta) = a_{0,i} + a_{1,i} \cdot \theta + a_{2,i} \cdot \theta^2 . \qquad (5)$$
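As an illustration of equations (4) and (5), the following sketch evaluates $-2\ln\mathcal{L}(\theta)$ for a binned observation. The coefficients are hypothetical placeholders, not the values used for the figures, and the factorial is written with the Gamma function so that non-integer event counts can be used as the observation.

```python
import math


def expected_yield(theta, a0, a1, a2):
    """Quadratic signal prediction for one bin, as in equation (5)."""
    return a0 + a1 * theta + a2 * theta ** 2


def neg2_log_likelihood(theta, observed, coeffs):
    """-2 ln L(theta) for independent Poisson bins, as in equation (4).

    observed: event counts n_i (integers or real numbers)
    coeffs:   one (a0, a1, a2) tuple per bin
    """
    total = 0.0
    for n, (a0, a1, a2) in zip(observed, coeffs):
        mu = expected_yield(theta, a0, a1, a2)
        if mu <= 0.0:
            raise ValueError("expected yield must be positive")
        # ln of the Poisson probability; lgamma(n + 1) generalises ln(n!)
        total += n * math.log(mu) - mu - math.lgamma(n + 1.0)
    return -2.0 * total


# Hypothetical coefficients: falling SM yield, rising sensitivity, no interference.
coeffs = [(1000.0 / (i + 1), 0.0, 2000.0 * (i + 1)) for i in range(10)]
sm_observation = [expected_yield(0.0, *c) for c in coeffs]

nll_at_sm = neg2_log_likelihood(0.0, sm_observation, coeffs)
nll_shifted = neg2_log_likelihood(0.03, sm_observation, coeffs)
nll_mirrored = neg2_log_likelihood(-0.03, sm_observation, coeffs)
```

Because the linear coefficients are zero in this toy setup, the function is symmetric in $\theta$ and has its minimum at $\theta = 0$ when the SM expectation is used as the observation.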
Fig. 1: Distributions of the observable, $x$, for three values of the
parameter: $\theta = 0$ (black), $\theta = 0.03$ (blue) and $\theta = -0.02$ (red).
The signal parameterisation has $a_1 = 0$ for all bins in the
observable. For convenience, the values of $a_0$ and $a_2$ are given
below the graphs for each bin, respectively.

The quantities $n_i$ in equation (4) correspond to the number of
observed events in the $i$th bin of the observable, and the set $\{n_i\}$
is referred to as the observation. For pseudo data, $\{n_i\}$ are
integers, but when using the predicted signal for a given value of $\theta$
as the event count, e.g. when estimating the confidence interval
for the SM expectation, $\{n_i\}$ are treated as real numbers³.

It should be noted that in equation (4) we have adopted the
standard approach where the likelihood is considered a function
of the parameter, hence suppressing the dependence on
$\{n_i\}$ in the notation, i.e.

$$\mathcal{L}(\theta) = \mathcal{L}(\{n_i\} \,|\, \theta) . \qquad (6)$$

Initially, the interference term in the model is set to zero
for all values of the observable, i.e. $a_1 = 0$ for all bins⁴. The
value of $a_2$ is increasing across the interval in $x$ such that the
sensitivity of $x$ to the parameter $\theta$ grows monotonically with
$x$.⁵ Figure 1 shows the SM expectation (black) and the
distributions for $\theta = 0.03$ (blue) and $\theta = -0.02$ (red) using this
parameterisation. The values of $a_0$ and $a_2$ in each bin are given
in the plot. The sizes of all data samples are large enough to
avoid dealing with features related to low event counts in the
individual bins of the observable.

All fits are performed using the minimisation routine
MINUIT [10] via its implementation in ROOT [11], and the
function which is minimised is twice the negative logarithm of the
likelihood ratio, defined as

$$-2\ln q(\theta) = -2\left[\ln\mathcal{L}(\theta) - \ln\mathcal{L}(\hat{\theta})\right] , \qquad (7)$$

where $q$ denotes the likelihood ratio and $\hat{\theta}$ is the maximum
likelihood estimator for $\theta$.

³ In this case, the factorial in the Poisson probability in equation (4)
is substituted with the Gamma function.

⁴ The effects of non-zero interference, $a_1(x) \neq 0$, are addressed in
section 6.

⁵ This choice is arbitrary. Here we have chosen values that give a
behaviour similar to what is encountered in LHC and Tevatron
experiments.

Fig. 2: The black curve shows $-2\ln q$ as a function of $\theta$ when
the SM expectation is used as the observation. The intersections
between the horizontal line at 3.84 and the curve give the delta
likelihood ratio interval at 95% CL, as indicated by the vertical
arrows.

The next sub-sections describe the four statistical methods,
using the SM expectation as the observation for illustration.


3.1 Delta likelihood method 

Traditionally, the delta likelihood method has been used for
reporting confidence intervals, e.g. [?, ?]. It is the simplest
and fastest method for computing confidence intervals among
the approaches described here, since it does not require large
amounts of simulated data.

The confidence interval is estimated by considering the
variation of the likelihood function near its maximum. It is given
by the interval $[\theta_{\mathrm{low}}, \theta_{\mathrm{high}}]$ for which $\theta$ satisfies

$$-2\ln q(\theta) < -2\ln q_{\mathrm{CL}} , \qquad (8)$$

where $-2\ln q_{\mathrm{CL}}$ is a constant computed from the chi-square
distribution with one degree of freedom. For a confidence interval
at 95% CL, this is given by $-2\ln q_{95\%} = 3.84$.

Figure 2 shows $-2\ln q$ when the SM expectation is used as
the observation. The dashed horizontal line indicates the 95%
CL and the vertical arrows give the end-points of the
corresponding confidence interval.
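The procedure in equation (8) amounts to a scan in $\theta$. The sketch below applies it to a simplified single-bin model without a linear term, for which the best-fit yield is known in closed form; this simplification, the function names and the coefficients are assumptions made for illustration and do not reproduce the multi-bin fit behind figure 2.

```python
import math


def neg2_ln_q(theta, n, a0, a2):
    """-2 ln q(theta) for one Poisson bin with mu(theta) = a0 + a2*theta**2.

    Since mu cannot go below a0, the best-fit yield is max(n, a0).
    """
    mu = a0 + a2 * theta ** 2
    mu_hat = max(n, a0)
    return -2.0 * ((n * math.log(mu) - mu) - (n * math.log(mu_hat) - mu_hat))


def delta_likelihood_intervals(n, a0, a2, threshold=3.84,
                               theta_max=0.1, steps=20001):
    """Scan theta on a grid and return the contiguous regions below threshold."""
    intervals, start, previous = [], None, None
    for k in range(steps):
        theta = -theta_max + 2.0 * theta_max * k / (steps - 1)
        inside = neg2_ln_q(theta, n, a0, a2) < threshold
        if inside and start is None:
            start = theta                        # a region opens
        if not inside and start is not None:
            intervals.append((start, previous))  # the region closed one step ago
            start = None
        previous = theta
    if start is not None:
        intervals.append((start, previous))
    return intervals


# Hypothetical numbers: observation at the SM expectation, and a clear excess.
sm_intervals = delta_likelihood_intervals(n=100.0, a0=100.0, a2=5.0e4)
excess_intervals = delta_likelihood_intervals(n=150.0, a0=100.0, a2=5.0e4)
```

For an observation at the SM expectation the scan returns one interval centred on zero, while a sufficiently large excess gives two disjoint intervals, one for each sign of $\theta$.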
Fig. 3: The contour plot shows the Neyman construction at 95%
CL with Feldman-Cousins ordering. The confidence interval is
given by the intersections with the dashed horizontal line at
$\hat{\theta}_{\mathrm{obs}} = 0$, as indicated by the vertical arrows.

3.2 Confidence belt 

The original frequentist confidence interval $[\theta_{\mathrm{low}}, \theta_{\mathrm{high}}]$ for the
parameter $\theta$ is computed by constructing the confidence belt.
This is also called a Neyman construction, as the general
principle was first formulated by Jerzy Neyman in 1937 [?].

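The confidence belt consists of the conjunction of intervals
$[\hat{\theta}_{\mathrm{low}}, \hat{\theta}_{\mathrm{high}}]$ which are determined for each value of $\theta$ by
integrating the probability density function $P(\hat{\theta}|\theta)$ such that

$$\int_{\hat{\theta}_{\mathrm{low}}}^{\hat{\theta}_{\mathrm{high}}} P(\hat{\theta}|\theta) \, d\hat{\theta} = \alpha , \qquad (9)$$

where $\alpha$ denotes the confidence level.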

The belt has the property that as long as equation (9) is satisfied
for all $\theta$, any orthogonal intersection with the confidence
belt at a given $\hat{\theta}$ gives a set of intervals in $\theta$ with a coverage
probability $\alpha$. Thus, the confidence interval is determined by
the orthogonal intersection at the value of the maximum likelihood
estimator for the observation, $\hat{\theta}_{\mathrm{obs}}$.

While this procedure ensures coverage by construction, it
still allows the freedom to choose which elements are to be inside
the interval given by equation (9). The exact choice makes the
interval unique and is known as the ordering principle. Feldman
and Cousins developed an ordering principle which is usually
referred to as the unified approach, likelihood ratio ordering
or Feldman-Cousins ordering [?]. According to this principle,
the interval is defined by including elements of probability
ordered by their likelihood ratios, such that higher ratios are given
precedence over lower ratios for inclusion in the belt.

The Feldman-Cousins ordering prescription is used here
to be able to make a direct comparison to the p-value method
described in section 3.3. It should be noted that the original
problems addressed by Feldman and Cousins are not present in
our case.

Fig. 4: The distribution of the 95% highest ranking pseudo
experiments using Feldman-Cousins ordering, which defines the
confidence belt shown in figure 3.

Figure 3 shows the confidence belt at 95% CL. It is
constructed numerically with simulated data in the form of pseudo
experiments drawn from the expected distribution for a suitable
range in $\theta$. The distinct cross-like shape of the confidence belt
reflects the quadratic dependence on $\theta$ in the signal prediction,
which implies that $\theta$ is mapped to both same-sign and
opposite-sign $\hat{\theta}$. When the SM expectation is used as the observation,
the confidence interval is given by the intersections between
the dashed horizontal line at $\hat{\theta}_{\mathrm{obs}} = 0$ and the confidence belt,
as illustrated by the vertical arrows in figure 3.

As further illustration, figure 4 shows the two-dimensional
distribution of the 95% highest ranking pseudo experiments
which define the confidence belt. It is seen that many pseudo
experiments give $\hat{\theta} = 0$. This is also seen in figure 5, which
shows three vertical projections of the two-dimensional
distribution in figure 4. The projection for $\theta = 0$ (black histogram
in figure 5) shows that roughly half of the pseudo experiments
give $\hat{\theta} = 0$. The reason for this sharp peak at $\hat{\theta} = 0$ is that
the lower bound on the predicted signal is located at the SM,
$\theta = 0$, when the linear term is absent. For pseudo experiments
generated around $\theta = 0$, there is a high probability that a
significant part of the bins in the observable have a downward
fluctuation in the event yield with respect to the SM expectation, and since
these bins suggest that the best fit is $\hat{\theta} = 0$, the fit is pulled
in this direction. Figure 6 shows three horizontal projections
of the two-dimensional distribution in figure 4. It is seen that
these projections show no signs of the peak structure visible in
figure 5, since the peak structure is purely horizontal. The
horizontal projection at $\hat{\theta} = 0$ (black histogram in figure 6) shows
how the peak evolves as a function of $\theta$ and that it is a smoothly
rising distribution which is symmetric around $\theta = 0$, as should be
expected.
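The pile-up of best-fit values at zero can be reproduced in a simplified single-bin model without a linear term, where the best fit is zero whenever the observed count does not exceed the SM expectation. The sketch below computes this probability exactly from the Poisson distribution; the single-bin simplification and all numerical values are assumptions for illustration, whereas the figures above are based on the multi-bin fit.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of observing n events for a given mean."""
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1.0))


def theta_hat_abs(n, a0, a2):
    """|theta_hat| for one bin with mu(theta) = a0 + a2*theta**2.

    Counts at or below a0 cannot be matched by the model, so the fit
    ends up at the lower bound, theta_hat = 0.
    """
    return math.sqrt(max(n - a0, 0.0) / a2)


def prob_theta_hat_zero(theta_true, a0, a2):
    """Probability that a pseudo experiment generated at theta_true gives theta_hat = 0."""
    mean = a0 + a2 * theta_true ** 2
    return sum(poisson_pmf(n, mean) for n in range(0, int(a0) + 1))


# Hypothetical single-bin model with an SM expectation of 100 events.
p_zero_at_sm = prob_theta_hat_zero(0.0, a0=100.0, a2=5.0e4)
p_zero_far_away = prob_theta_hat_zero(0.03, a0=100.0, a2=5.0e4)
```

In this toy model roughly half of the pseudo experiments generated at the SM give a best fit of exactly zero, while the fraction becomes negligible once the true value is well separated from zero.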
Fig. 5: Distributions of the 95% highest ranking pseudo
experiments for $\theta = 0$ (black), $\theta = 0.0164$ (blue) and $\theta = 0.03$ (red),
corresponding to three vertical projections at these values of $\theta$
of the contour plot shown in figure 4.

Fig. 6: Distributions of the 95% highest ranking pseudo
experiments for $\hat{\theta} = 0$ (black), $\hat{\theta} = 0.0164$ (blue) and $\hat{\theta} = 0.03$ (red),
corresponding to three horizontal projections at these values of
$\hat{\theta}$ of the contour plot shown in figure 4.


As mentioned in section 1, the Feldman-Cousins approach
has been studied before in the context of couplings in effective
field theories [?]. However, the specific implementation of the
method is different from what is done here. In fact, in [?] two
different but equivalent methods are employed for single- and
multi-bin distributions, respectively. For the single-bin
distribution, an observed event yield is used to derive a confidence
interval on the predicted event yield via the Feldman-Cousins
prescription. This interval is then translated into an interval on
the parameter by solving the quadratic equation describing the
relation between the two. The problem with this approach is
that the translation does not preserve probability, as the mapping
from the event yield to the parameter does not exist for all
values of the event yield. In order to properly map between the
observation in data and the true parameter, the observed event
yield must first be stated in terms of the measured parameter
before mapping into a subset of values for the true parameter, as
we have done here. For a multi-bin distribution, the
implementation in [?] reverts to the equivalent p-value method, described
in the following.


3.3 p-value method 

The p-value method is an alternative frequentist approach to
the confidence belt. Traditionally, p-values are used for
hypothesis testing and do not depend on the parameter. However, at
the LHC they have been used to report confidence intervals in
conjunction with parameter estimation, e.g. [?].

The idea is to determine the confidence interval by inverting
a hypothesis test quantified by a p-value. This approach is
completely equivalent to the confidence belt with likelihood
ratio ordering when the signal prediction depends linearly on the
parameter, which includes the important case of estimating the
signal strength parameter in a resonance search. In this case,
the confidence belt corresponds to the acceptance region of the
hypothesis test⁶.

The p-value is defined as

$$p(\theta) = -2 \int_{-2\ln q_{\mathrm{obs}}(\theta)}^{\infty} f(-2\ln q(\theta)) \, d\ln q(\theta) , \qquad (10)$$

where $f(-2\ln q(\theta))$ denotes the distribution of $-2\ln q$ for a
given $\theta$.

The confidence interval is determined by the interval in $\theta$
for which the p-value is larger than $1 - \alpha$, where $\alpha$ indicates
the confidence level.

The calculation of the p-value can be done numerically by
performing pseudo experiments. In this case, it is given by the
fraction of pseudo experiments for which the value of $-2\ln q$
is larger than it is for the observation, i.e.

$$p(\theta) = \frac{N_{-2\ln q_{\mathrm{toy}}(\theta) \, > \, -2\ln q_{\mathrm{obs}}(\theta)}}{N_{\mathrm{total}}} , \qquad (11)$$

where

$$-2\ln q_{\mathrm{toy}}(\theta) = -2\left[\ln\mathcal{L}_{\mathrm{toy}}(\theta) - \ln\mathcal{L}_{\mathrm{toy}}(\hat{\theta})\right] , \qquad (12)$$

and

$$-2\ln q_{\mathrm{obs}}(\theta) = -2\left[\ln\mathcal{L}_{\mathrm{obs}}(\theta) - \ln\mathcal{L}_{\mathrm{obs}}(\hat{\theta})\right] . \qquad (13)$$

The likelihood functions for the observation and pseudo experiment
are denoted $\mathcal{L}_{\mathrm{obs}}$ and $\mathcal{L}_{\mathrm{toy}}$, and $N_{-2\ln q_{\mathrm{toy}}(\theta) \, > \, -2\ln q_{\mathrm{obs}}(\theta)}$
is the number of pseudo experiments for which the value of
$-2\ln q$ is larger than it is for the observation, while $N_{\mathrm{total}}$ is the
total number of pseudo experiments performed for this value
of $\theta$.

⁶ As will be demonstrated later, this relationship does not hold when
the signal prediction depends quadratically on the parameter of
interest.

Fig. 7: The solid black curve shows the p-value when the SM
expectation is used as the observation. The confidence interval
is given by the values in $\theta$ for which the p-value is larger than
$1 - \alpha$, where $\alpha$ denotes the confidence level (shown as a dashed
line for a 95% CL). The end-points of the confidence interval
at 95% CL are indicated by the vertical arrows.

Fig. 8: The histogram shows the distribution of $-2\ln q$ for
pseudo experiments produced for $\theta = 0.02$. The vertical arrow
indicates the value of $-2\ln q$ for the observation. The p-value
is equal to the fraction of pseudo experiments which fall above
this value, i.e. inside the grey area.
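The counting in equation (11) can be sketched for a simplified single-bin model without a linear term. The Poisson sampler, the function names and the numbers below are illustrative assumptions. One further assumption is labelled in the code: pseudo experiments that tie with the observation are counted as at least as extreme, so that an observation sitting exactly at the best fit obtains a p-value of one.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson count (Knuth's multiplication method, moderate means only)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def neg2_ln_q(theta, n, a0, a2):
    """-2 ln q(theta) for one Poisson bin with mu(theta) = a0 + a2*theta**2."""
    mu = a0 + a2 * theta ** 2
    mu_hat = max(float(n), a0)  # the model cannot predict fewer than a0 events
    return -2.0 * ((n * math.log(mu) - mu) - (n * math.log(mu_hat) - mu_hat))


def p_value(theta, n_obs, a0, a2, n_toys=2000, seed=7):
    """Fraction of pseudo experiments generated at theta that are at least as extreme."""
    rng = random.Random(seed)
    mean = a0 + a2 * theta ** 2
    q_obs = neg2_ln_q(theta, n_obs, a0, a2)
    n_extreme = 0
    for _ in range(n_toys):
        n_toy = sample_poisson(mean, rng)
        # Assumption: ties are included (>=) rather than strictly larger.
        if neg2_ln_q(theta, n_toy, a0, a2) >= q_obs:
            n_extreme += 1
    return n_extreme / n_toys


# Hypothetical model; the observation equals the SM expectation of 50 events.
p_at_sm = p_value(0.0, n_obs=50, a0=50.0, a2=2.5e4)
p_far_away = p_value(0.04, n_obs=50, a0=50.0, a2=2.5e4)
```

Scanning such a function over $\theta$ and keeping the values with a p-value above $1 - \alpha$ gives the interval; in practice the number of pseudo experiments has to be large enough to resolve the tail at the chosen confidence level.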

In figure 7, the solid black curve shows the p-value as a
function of $\theta$ when the SM expectation is used as the observation.
The vertical arrows indicate the values of $\theta$ for which the
p-value is 5% and thus determine the confidence interval at 95%
CL.

In order to illustrate the p-value method in more detail,
figure 8 shows the distribution of $-2\ln q_{\mathrm{toy}}$ for a specific value
of $\theta$. The vertical arrow indicates the corresponding value for
the observation, $-2\ln q_{\mathrm{obs}}$, and the grey-shaded area represents
the pseudo experiments which have $-2\ln q_{\mathrm{toy}} > -2\ln q_{\mathrm{obs}}$. The
ratio between the number of pseudo experiments in the
grey-shaded area and all pseudo experiments gives the p-value for
this $\theta$.

It should be noted that if the distribution of $-2\ln q_{\mathrm{toy}}$
follows a chi-square distribution with one degree of freedom for all
$\theta$, the p-value and delta likelihood methods produce identical
confidence intervals. This will be examined in more detail in
section 5, where it will be shown that while the distribution of
$-2\ln q$ indeed does follow a chi-square distribution for large
values of $|\theta|$, the same is not true for values of $\theta$ around zero.


3.4 CLs method

The CLs method [?] was developed during the running of the
Large Electron-Positron (LEP) collider and has been used both
at LEP and at the LHC to report confidence intervals in
resonance searches, e.g. [?], and parameters in effective
theories, e.g. [20]. It is motivated by the attempt to provide
more conservative confidence intervals in the case of a
non-observation where both the background-only, i.e. the SM, and the
signal-plus-background hypotheses are disfavoured by the
observation. For this reason, the CLs method is by construction
not expected to give the correct frequentist coverage probability.

The CLs method proceeds by calculating p-values, as defined
in equations (10)-(13), for the background-only hypothesis,
denoted $CL_b$, and the signal-plus-background hypothesis,
denoted $CL_{s+b}(\theta)$. The quantity $CL_s(\theta)$ is then defined as the
ratio between the p-values for the two hypotheses,

$$CL_s(\theta) = \frac{CL_{s+b}(\theta)}{CL_b} . \qquad (14)$$

The confidence interval is determined by the values of $\theta$ for
which $CL_s$ is larger than $1 - \alpha$, where $\alpha$ denotes the confidence
level.

When the SM expectation is used as the observation, the
p-value for the background-only hypothesis is exactly one,
$CL_b = 1$. Consequently, the quantities denoted $p(\theta)$ and
$CL_s(\theta)$ in equations (11) and (14), respectively, are the same and
thus the p-value and CLs methods are identical. Section 5
investigates scenarios where this is not the case.
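The effect of the normalisation by the background-only p-value can be seen in a few lines. The sketch below only implements the ratio and the acceptance condition described for the CLs method; the function names and the input p-values are hypothetical.

```python
def cl_s(cl_sb, cl_b):
    """Ratio of the signal-plus-background and background-only p-values."""
    if not 0.0 < cl_b <= 1.0:
        raise ValueError("cl_b must be in the range (0, 1]")
    return cl_sb / cl_b


def in_interval(cl_sb, cl_b, confidence_level=0.95):
    """True if this parameter value is inside the CLs confidence interval."""
    return cl_s(cl_sb, cl_b) > 1.0 - confidence_level


# Hypothetical p-values for one value of theta.
# With cl_b = 1 the CLs and p-value methods agree: theta is excluded.
excluded_when_sm_like = not in_interval(cl_sb=0.04, cl_b=1.0)
# If the background-only hypothesis is itself disfavoured (cl_b < 1),
# the same cl_sb no longer excludes theta, i.e. the interval is wider.
kept_when_both_disfavoured = in_interval(cl_sb=0.04, cl_b=0.5)
```

Since $CL_b \leq 1$, the ratio is never smaller than $CL_{s+b}$, which is why the resulting intervals are at least as wide as those from the p-value method.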


4 The Baur Set 

As the problem under study arises for experimental outcomes
not described by the model for any value of the parameter, we
seek a procedure to define pseudo datasets in this region that
reflect the parameter dependence in the allowed region.

This may be achieved in many ways; here we choose a mapping
related to the statistical sensitivity in the allowed region,
in such a way that the migration between the two regions is
exclusively tied to a single mapping parameter. The procedure
works in any dimensionality and ensures well-defined datasets
for all values of the parameter. In short, this is achieved by
scaling the SM distribution with the ratio between the SM and
a distribution in the allowed region. The choice of parameter
value for the distribution in the allowed region is made in terms
of the statistical precision of the SM distribution. The resulting
set of distributions is called the Baur set⁷.

K. D. Gregersen, J. B. Hansen: Frequentist limit setting in effective field theories

Fig. 9: Baur distributions for a subset of values of the Baur
parameter, $r \in \{0, \pm 1, \pm 1.5\}$. The distributions are constructed
with the value of $\sigma_\theta^{95}$ determined in section 3.1.

Fig. 10: The curves show $-2\ln q$ for Baur distributions with
$r \in \{0, \pm 1, \pm 1.5\}$ being used in turn as the observation. The
boxes in the lower part indicate the corresponding confidence
intervals at 95% CL as determined by the delta likelihood ratio.

The Baur set consists of Baur distributions which are
uniquely defined by their value of the Baur parameter, $r$, which is
a real number. The Baur distributions are constructed by first
deriving the confidence interval $[\theta_{\mathrm{low}}, \theta_{\mathrm{high}}]$ at 95% CL as
determined by the delta likelihood ratio using an observation at
the SM expectation, and defining the quantity $\sigma_\theta^{95}$ as

$$\sigma_\theta^{95} = (\theta_{\mathrm{high}} - \theta_{\mathrm{low}})/2 . \qquad (15)$$

For a Gaussian likelihood function, this is simply two standard
deviations⁸.

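Given this measure, the full Baur set is then defined as the
infinite set of Baur distributions, $B(x; r)$, given by

$$B(x; r) = \begin{cases} h(x; r\sigma_\theta^{95}) & r \geq 0 \\[4pt] h(x; 0) \cdot \dfrac{H(x, dx; 0)}{H(x, dx; r\sigma_\theta^{95})} & r < 0 \end{cases} \qquad (16)$$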


⁷ Named after the late Ulrich Baur in recognition of his tremendous
contribution to the field of diboson physics.

⁸ It should be noted that the choice of confidence level and statistical
method for computing $\sigma_\theta^{95}$ is arbitrary; however, to keep it consistent
with the choice of confidence level used in other sections, a 95%
CL is also used here, and the delta likelihood ratio is used in order to
keep the definition as simple as possible from a computational point
of view.


where $h(x; \theta)$ is the distribution of the observable $x$ for a given
$\theta$ and $H(x, dx; \theta)$ is the cumulative distribution of $h(x; \theta)$ in the
small interval $dx$,

$$H(x, dx; \theta) = \int_{x}^{x + dx} h(x'; \theta) \, dx' , \qquad (17)$$

subject to the requirement $H(x, dx; \theta) > 0$.

For the binned observable used here, these definitions
translate into:

• $B(x; r) \to B_i(r)$

• $h(x; \theta) \to h_i(\theta)$

• $H(x, dx; \theta) \to h_i(\theta) \, dx_i$

where $i$ denotes the bin number, $h_i(\theta)$ is the event yield, which
is given by $\mu_i(\theta)$ in equation (5), and $dx_i$ is the bin width.

When the linear term in the signal prediction is set to zero
for all $x$, the Baur set has a very straightforward interpretation.
For Baur distributions with $r > 0$, the event yield, $B(x; r)$, is
greater than or equal to the SM expectation for all $x$, which
means that these distributions are in the region described by
the model. However, for Baur distributions with $r < 0$, the event
yield is lower than the SM expectation for all $x$, and hence these
Baur distributions are in the region which is not described by
the model.
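The binned construction can be sketched as follows. The sketch rests on an assumption about the branch for negative $r$: following the description of scaling the SM distribution with the ratio between the SM and a distribution in the allowed region, the SM yield in each bin is multiplied by the SM yield divided by the yield at $r\sigma_\theta^{95}$, so that the bin widths cancel. The function names, coefficients and the value used for $\sigma_\theta^{95}$ are hypothetical.

```python
def expected_yield(theta, a0, a1, a2):
    """Quadratic event yield in one bin."""
    return a0 + a1 * theta + a2 * theta ** 2


def baur_distribution(r, sigma95, coeffs):
    """Binned Baur distribution B_i(r) for a list of (a0, a1, a2) per bin.

    r >= 0: the model prediction at theta = r * sigma95.
    r <  0: the SM yield scaled by (SM yield) / (yield at r * sigma95);
            this form of the scaling is an assumption of this sketch.
    """
    distribution = []
    for a0, a1, a2 in coeffs:
        sm = expected_yield(0.0, a0, a1, a2)
        shifted = expected_yield(r * sigma95, a0, a1, a2)
        if shifted <= 0.0:
            raise ValueError("yield must be positive")
        distribution.append(shifted if r >= 0 else sm * sm / shifted)
    return distribution


# Hypothetical two-bin model without a linear term.
coeffs = [(100.0, 0.0, 5.0e4), (50.0, 0.0, 1.0e5)]
sm_yields = baur_distribution(0.0, sigma95=0.02, coeffs=coeffs)
above_sm = baur_distribution(1.0, sigma95=0.02, coeffs=coeffs)
below_sm = baur_distribution(-1.0, sigma95=0.02, coeffs=coeffs)
```

With a vanishing linear term, positive $r$ gives yields at or above the SM expectation in every bin, while negative $r$ gives yields below it, which no value of $\theta$ can reproduce.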

When allowing a non-zero linear term in the signal prediction,
the value of the lower bound on the signal prediction is
lower than the SM expectation and thus the interpretation of
the Baur set is different. The lower bound still persists, but it is
shifted away from $\theta = 0$ and is in general at different values of
$\theta$ for different $x$.
Fig. 11: Comparison of the four statistical methods for computing
confidence intervals (CI). The confidence intervals at 95%
CL are shown as a function of the Baur parameter, $r$, in the range
$r \in [-1.5, 1.5]$.


Figure 9 shows Baur distributions for the binned observable
for $r \in \{0, \pm 1, \pm 1.5\}$, with the value of $\sigma_\theta^{95}$ being determined
by the confidence interval given in section 3.1.


5 Confidence intervals for the Baur set 

This section compares the confidence intervals produced by the
four different statistical methods introduced in section 3 when a
subset of Baur distributions are used in turn as the observation.
The Baur distributions are made from the binned observable,
and the linear term in the signal prediction is zero for all bins.

In order to illustrate the procedure, figure 10 shows the
confidence intervals as determined by the delta likelihood ratio
when the Baur distributions with $r \in \{0, \pm 1, \pm 1.5\}$ are used
in turn as the observation. The solid curves in the upper part
of figure 10 show $-2\ln q$ for each of the Baur distributions.
The intersections between the curves and the dashed horizontal
line at 3.84 give the confidence intervals at 95% CL, which are
shown in the lower part of figure 10 in corresponding colours.

It is seen that the confidence intervals for the largest values
of the Baur parameter ($r = 1$ and $r = 1.5$) consist of two disjoint
intervals due to the corresponding $-2\ln q$ curves having two
distinct minima. The two minima originate from the quadratic
dependence on $\theta$ in the signal prediction. The reason they are
symmetric around $\theta = 0$ and have equal depth is that the linear
term in the signal prediction is zero for all bins. For $r < 0$,
there is only one minimum, $\theta = 0$. For these values of the Baur
parameter, it is seen that the $-2\ln q$ curves become narrower
as $r$ decreases, effectively decreasing the size of the confidence
intervals.


Similarly, confidence intervals as a function of the Baur
parameter can be computed for the other methods. The comparison
between all methods is given in figure 11, which shows
the confidence intervals when Baur distributions for values of
$r$ in the range $r \in [-1.5, 1.5]$ are used in turn as the observation.
A number of differences between the methods are clearly
seen and these will serve as the basis for the discussion in the
remainder of this section.

The first and main difference to be addressed is that the
confidence intervals from the confidence belt remain constant
for negative $r$, while the alternative methods give confidence
intervals which are smaller as $r$ decreases.

The intervals from the confidence belt remain constant because
the maximum likelihood estimators are the same for all
values of the Baur parameter below zero, namely θ̂ = 0, as also
indicated in the upper part of figure 10. Therefore it is the same
intersection with the confidence belt, i.e. at θ̂ = 0, which gives
the confidence intervals for r < 0.
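
As a minimal illustration, consider a single-bin counting
experiment with an assumed prediction mu(theta) = N_SM + A2 *
theta^2 (the form, the numbers and the helper name abs_theta_hat
are assumptions for this sketch). The Poisson likelihood is
maximised at mu = n when this is allowed, and at the bound
mu = N_SM otherwise, so every count at or below N_SM gives the
same estimate.

```python
import math

N_SM = 100.0   # assumed SM expectation (illustrative value)
A2 = 1.0e5     # assumed quadratic coefficient (illustrative value)

def abs_theta_hat(n):
    """|theta_hat| for a Poisson count n under the bound mu >= N_SM."""
    mu_hat = max(float(n), N_SM)   # best-fit expectation
    return math.sqrt((mu_hat - N_SM) / A2)

for n in (60, 80, 100, 140):
    print(n, abs_theta_hat(n))
```

The first three counts all return zero, which is why the same
point of the confidence belt is probed for every deficit.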

The alternative methods produce smaller intervals because
−2 ln q becomes narrower as r decreases below zero as shown
in the upper part of figure 10. Evidently, this is also correlated
with an increasing disagreement between the observation and
the best fit. Since such a disagreement is described in terms
of the goodness-of-fit, it indicates that the goodness-of-fit is
encoded in the shape of the likelihood function for r < 0.

In order to support this statement more quantitatively, the
shape of −2 ln q for Baur distributions with r < 0 is examined
by considering the simplified case where only the total number
of events is used to estimate the parameter θ, i.e. focusing on
a single-bin observable. In this case, the likelihood is given by
the Poisson probability of observing n events with an expectation
of μ(θ), where μ(θ) depends quadratically on θ,

    L(n; \theta) = \frac{\mu(\theta)^{n} e^{-\mu(\theta)}}{n!}.    (18)


In order to examine how the shape of −2 ln q changes as
function of n, or equivalently as function of r, the quantity
R(n; θ) is defined, for a given θ, as the difference in −2 ln q
for observing n and n_SM events, respectively,

    R(n; \theta) = [-2\ln q(n; \theta)] - [-2\ln q(n_{SM}; \theta)],    (19)

where n_SM refers to the expected number of events from the
SM. The quantity R(n; θ) effectively describes how the shape
of −2 ln q varies for different observations, n.

Investigating the scenario where n < n_SM, which corresponds
to r < 0, and using that the SM expectation is equal
to the value of the lower bound on the signal prediction, i.e.
the maximum likelihood estimators satisfy θ̂(n) = θ̂(n_SM) = 0,
it can be shown that R(n; θ) is given by

    R(n; \theta) = 2 (n_{SM} - n) \ln\left( \frac{\mu(\theta)}{n_{SM}} \right),  n < n_{SM}.    (20)

Due to the quadratic dependence on θ, μ(θ) is greater than
or equal to n_SM for all values of θ, and consequently, R(n; θ)
is positive and increases linearly with decreasing n for any
θ ≠ 0. This explains why the shape of −2 ln q becomes narrower
for decreasing r.
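
For completeness, equation (20) can be obtained in a few lines.
The sketch below assumes that q is the ratio of the Poisson
likelihood (18) at θ to its value at the best fit, and that the
best fit sits at the bound, μ(θ̂) = n_SM, for any n ≤ n_SM.

```latex
\begin{align}
-2\ln q(n;\theta)
  &= -2\ln\frac{L(n;\theta)}{L(n;\hat\theta)}
   = 2\left[\mu(\theta) - n_{SM}
        - n\ln\frac{\mu(\theta)}{n_{SM}}\right],
   \qquad n \le n_{SM}, \\
R(n;\theta)
  &= 2\left[-n\ln\frac{\mu(\theta)}{n_{SM}}
        + n_{SM}\ln\frac{\mu(\theta)}{n_{SM}}\right]
   = 2\,(n_{SM}-n)\,\ln\frac{\mu(\theta)}{n_{SM}} .
\end{align}
```

The terms μ(θ) − n_SM are common to both observations and cancel
in the difference, leaving only the logarithmic term.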








K. D. Gregersen, J. B. Hansen: Frequentist limit setting in effective field theories 






Fig. 12: The curves show −2 ln q for observations given by
Baur distributions with r ∈ {0, ±1}, respectively. The dashed
line displays the 95% CL contour line as determined by the
pseudo experiments. The intersections between the curves and
the contour line give the confidence intervals for the p-value
method.


The corresponding goodness-of-fit as function of n is
described by the chi-square test statistic,

    \chi^2(n) = \frac{(n - n_{SM})^2}{n_{SM}},    (21)

for n < n_SM.

It is seen that R²(n; θ) and the chi-square are directly
proportional to each other,

    R^2(n; \theta) \propto \chi^2(n),    (22)

which means that there is a direct link between the shape of the
likelihood function and the goodness-of-fit for scenarios where
fewer events are observed than what is predicted by the SM.
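
The proportionality in equation (22) can be checked numerically.
The snippet below is a sketch under assumptions: a single-bin
prediction mu(theta) = N_SM + A2 * theta^2 with illustrative
values for N_SM and A2, and function names chosen for the
example. For a fixed theta, the ratio R^2 / chi^2 should not
depend on n.

```python
import numpy as np

N_SM = 100.0   # assumed SM expectation (illustrative value)
A2 = 1.0e5     # assumed quadratic coefficient (illustrative value)

def mu(theta):
    return N_SM + A2 * theta**2

def minus2lnq(n, theta):
    """-2 ln q for a Poisson count n; the best fit obeys mu >= N_SM."""
    mu_hat = max(n, N_SM)
    m = mu(theta)
    return 2.0 * ((m - mu_hat) - n * np.log(m / mu_hat))

def R(n, theta):
    """Equation (19): difference with respect to observing N_SM."""
    return minus2lnq(n, theta) - minus2lnq(N_SM, theta)

def chi2(n):
    """Equation (21): goodness-of-fit with respect to the SM."""
    return (n - N_SM) ** 2 / N_SM

theta = 0.01
counts = [60.0, 70.0, 80.0, 90.0]
ratios = np.array([R(n, theta) ** 2 / chi2(n) for n in counts])
# Value implied by equation (20): 4 * n_SM * ln^2(mu / n_SM)
expected = 4.0 * N_SM * np.log(mu(theta) / N_SM) ** 2
print(ratios, expected)
```

The constant of proportionality depends on θ through μ(θ) but
not on the observation, which is the content of equation (22).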

Consequently, any statistical method which relies on the
shape of the likelihood function will encode the goodness-of-fit
measure into the confidence interval, which is clearly
undesirable. Since the alternative methods for computing confidence
intervals explicitly depend on the shape of the likelihood
function, they will provide biased intervals which, as seen in
figure 11, over-constrain the parameter when fewer events are
observed than what is expected from the SM.

Another striking difference between the statistical methods
displayed in figure 11 is that the CLs method gives considerably
larger intervals than the other methods for large positive
values of the Baur parameter, which notably also do not
separate into two disjoint intervals. These features are due to the
fact that −2 ln q has a local maximum at θ = 0, the value of
which increases with increasing r (see the upper part of figure
10). Consequently, the corresponding p-values for the SM, i.e.
CLb, decrease and the confidence intervals grow in size and, by
construction, never split into two. The fact that the CLs method
gives larger intervals for these values of r is not surprising since
the method by construction is meant to overestimate the
intervals.

Fig. 13: Distributions of −2 ln q for two values of the parameter,
θ = 0 (solid blue line) and θ = 0.02 (solid red line), and the
distribution of the chi-square for one free parameter (dashed
black line). The inset figure is a zoom-in on the lower region.

It is also interesting that for r < 0 the confidence intervals
produced by the CLs method are identical to those produced by
the p-value method. Naively, one would expect the CLs method
to expand the confidence intervals in situations where the SM
expectation is disfavoured by the observation, as is the case for
these Baur distributions. However, due to the lower bound on
the signal prediction, the minimum of −2 ln q is at θ = 0 and
hence the p-value for the SM hypothesis for an observation
given by a Baur distribution with r < 0 is misleadingly equal
to one, CLb = 1. As a result, the quantities denoted p(θ) and
CLs in their respective defining equations are the same and thus
the p-value and CLs methods provide identical confidence
intervals.

We now address two more subtle differences between the
methods which are seen in figure 11. These will be explained
in detail since this gives a good understanding of the basic
mechanisms at play, which are important for the overall description
of the statistical methods.

The first is that the p-value and CLs methods provide smaller
confidence intervals than the delta likelihood method for r < 0.
The second is that the p-value and delta likelihood methods
disagree on the value of r where the confidence interval breaks
into two disjoint intervals. For the delta likelihood method this
occurs by construction at r = 1, while the p-value method also
produces two disjoint intervals for values of r slightly below
one.


In order to examine these observations in more detail,
figure 12 shows −2 ln q for observations given by three values of
the Baur parameter, r ∈ {0, ±1}, (solid curves) superimposed
on the 95% CL contour line as determined by the pseudo
experiments (dashed line), i.e. the line above which 5% of the pseudo
experiments fall for a given θ. From this figure, the confidence
intervals at 95% CL for the p-value method are given by the
intersections between the 95% CL contour line and the curves
showing −2 ln q.

It is seen that for large |θ|, corresponding to the region far
away from the bound, the 95% CL contour line agrees with
3.84. However, for small |θ| the line has a shift towards a lower
plateau due to the boundary. When investigating the
distributions of −2 ln q for two values of θ (see figure 13), it is seen that
the shift is due to many pseudo experiments having −2 ln q = 0.
The distribution of the chi-square for one free parameter is
superimposed (dashed curve) and it shows perfect agreement with
the distribution of −2 ln q for the large value of |θ| (the red
histogram) as expected.


The modification of the distribution of −2 ln q in figure 13
(blue histogram), and the corresponding downward shift in the
95% CL contour line in figure 12 (dashed line), occur since the
pseudo experiments not described by the model have θ̂ = 0.
Consequently, when scanning through values of θ close to zero,
the value of −2 ln q(θ) for these pseudo experiments is also
close to zero, and for θ = 0 it is identically zero. The fraction of
these pseudo experiments grows as θ approaches zero, at which
point it reaches approximately one half. The lower plateau
manifests itself when all of these pseudo experiments have migrated
below the 95% CL contour line. For larger values of |θ|, where
the pseudo experiments only rarely probe the region not
described by the model, the value of −2 ln q is not significantly
affected and thus the 95% CL contour line agrees with 3.84.
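
The mechanism can be reproduced with a small toy study. The
snippet below is a sketch under assumptions, not the setup used
for figures 12 and 13: a single-bin Poisson model with
mu(theta) = N_SM + A2 * theta^2, illustrative values for N_SM
and A2, and a hypothetical helper toy_summary. For a given true
θ it returns the fraction of pseudo experiments at the bound
(n ≤ N_SM, hence θ̂ = 0) and the 95% quantile of −2 ln q, which
plays the role of the 95% CL contour line at that θ.

```python
import numpy as np

N_SM = 100.0   # assumed SM expectation (illustrative value)
A2 = 1.0e5     # assumed quadratic coefficient (illustrative value)

def mu(theta):
    return N_SM + A2 * theta**2

def minus2lnq(n, theta):
    """-2 ln q for Poisson counts n; the best fit obeys mu >= N_SM."""
    n = np.asarray(n, dtype=float)
    mu_hat = np.maximum(n, N_SM)
    m = mu(theta)
    return 2.0 * ((m - mu_hat) - n * np.log(m / mu_hat))

def toy_summary(theta, n_toys=200_000, seed=1):
    """Fraction of toys at the bound and 95% quantile of -2 ln q."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(mu(theta), size=n_toys)
    t = minus2lnq(n, theta)
    frac_at_bound = np.mean(n <= N_SM)
    return frac_at_bound, np.quantile(t, 0.95)

for theta in (0.0, 0.03):
    print(theta, toy_summary(theta))
```

In this toy, roughly half of the pseudo experiments generated at
θ = 0 sit at the bound and the 95% quantile falls clearly below
3.84, whereas far from the bound the quantile is close to 3.84,
up to the granularity of the Poisson counts.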

The downward shift in the 95% CL contour line in figure
12 means that the intersections between this line and the curves
showing −2 ln q occur at different values of θ than the
corresponding intersections between these curves and a line at 3.84.
Consequently, the delta likelihood ratio method provides larger
intervals for Baur distributions with r < 0 than the p-value and
CLs methods, and the p-value method produces two disjoint
intervals for slightly smaller values of r compared to the delta
likelihood method.


As a final remark, it should be noted that while the Baur
distributions efficiently illustrate a number of differences
between the statistical methods, the situation is, in general, more
complicated since the data do not necessarily have the same
trend for all values of the observable. For example, a deficit
of events with respect to the SM expectation in a region less
sensitive to the parameter can be compensated by a surplus in
a more sensitive region. This aspect complicates the situation
considerably, and in fact there is no way to know if a
confidence interval computed with one of the alternative methods is
biased or not without explicitly also computing it with the
confidence belt. This is particularly interesting as it differs from
the situation where the signal prediction only depends linearly
on the parameter of interest. In this case, the confidence belt
corresponds to the acceptance region of the hypothesis test and
thus the p-value method will always give the same result as the
confidence belt. As demonstrated here, this is not the case when
the signal prediction depends quadratically on the parameter of
interest.

Fig. 14: The curves show −2 ln q for different values of
cos(Δφ) in the signal prediction, with the SM expectation
used as the observation in all cases. The boxes in the lower part
indicate the corresponding confidence intervals at 95% CL as
determined by the delta likelihood ratio.


6 Non-zero interference 

In the previous sections, it was assumed that the linear term
in the signal prediction was absent. This section will address the
general case with a non-zero linear term.

For the case of effective field theories, the linear term a_1(x)
corresponds to an interference term and can be written as

    a_1(x) = 2 \sqrt{a_0(x) a_2(x)} \cos(\Delta\phi(x)),    (23)

where Δφ(x) is the phase difference between the amplitudes
A_SM(x) and A_BSM(x).

The unknown dependence of a_1(x) is described entirely by
the phase difference through cos(Δφ(x)). Since cosine is
limited to the range [−1, 1], the size of a_1(x) is less than or equal
to 2√(a_0(x) a_2(x)).

In order to test the effects of a non-zero linear term, the
signal prediction is modified using equation (23), dropping the
x dependence in cos(Δφ) for simplicity. A range of values for
cos(Δφ) is considered. Note that for the extreme case of
cos(Δφ) = ±1, corresponding to a minimum signal prediction of
exactly zero, the Poisson likelihood is not defined.

Fig. 15: Confidence intervals at 95% CL as function of r in the
range r = [−1.5, 1.5], using four different values for cos(Δφ) in
the signal prediction: (a) cos(Δφ) = 0.1, (b) cos(Δφ) = −0.1,
(c) cos(Δφ) = 0.2 and (d) cos(Δφ) = −0.2. Axes: Baur
parameter (r) and confidence intervals (95% CL).


In order to give an idea of how the likelihood function
is affected when the model is modified, figure 14 shows the
curves for −2 ln q for an observation at the SM expectation
using seven different values of cos(Δφ) in the signal
prediction, cos(Δφ) ∈ {0, ±0.1, ±0.2, ±0.7}, corresponding to 0%,
10%, 20% and 70% of the maximal interference. It is seen
that as the size of the negative (positive) interference term
increases, there is a shift towards positive (negative) values of θ
in the −2 ln q curves and that a shoulder appears on the right
(left) hand side of the minimum. The intersections between the
curves and the dashed horizontal line at 3.84 give the
confidence intervals at 95% CL as determined by the delta
likelihood method, which are shown in the lower part of figure 14 in
corresponding colours. As the shoulder moves above the line at
3.84, which is the case for the extreme values cos(Δφ) = ±0.7,
the confidence intervals get smaller and become increasingly
symmetric around θ = 0. This reflects the fact that the linear
term in the signal prediction begins to dominate.

The comparison between the statistical methods for
different observations is done using Baur sets constructed for
four different values of cos(Δφ) in the signal prediction,
cos(Δφ) ∈ {±0.1, ±0.2}. Figures 15a to 15d show the confidence
intervals when the observation is given by the Baur
distributions for values of r in the range r = [−1.5, 1.5] for each of
the four Baur sets, respectively.

For all four Baur sets, clear trends for the statistical
methods are observed. First, it should be mentioned that the
qualitative differences between the graphs for positive versus negative
cos(Δφ) are due to the specific choice of the sign of σ_ref.
If the sign of σ_ref is changed, the features are
reversed between positive and negative cos(Δφ). For instance, it
is seen that for positive values of cos(Δφ) the sizes of the
confidence intervals are strictly increasing with r (until the point
where they break into two disjoint intervals), whereas for
negative values of cos(Δφ) there is an intermediate range in r
around r = 0 where the sizes of the confidence intervals
decrease with r. This is directly related to the sign of σ_ref and
the effect would be reversed if the sign was changed.


Addressing the differences between the methods, it is seen
in figure 15c that the otherwise defining feature of having two
disjoint confidence intervals for large values of the Baur
parameter does not apply to the delta likelihood and the p-value
methods. The reason is that for a combination of sufficiently
large cos(Δφ) and r, there is sensitivity to the sign of the
parameter. More specifically, it means that the two minima in
−2 ln q for the Baur distributions at large r are separated to
an extent which makes the non-global minimum lie above the
threshold for a 95% CL. Thus, only one confidence interval is
produced, and this will always be the upper one in the figure
due to the way the Baur distributions are defined. Hints of this
trend can also be seen in figures 15a, 15b and 15d where the
lower intervals produced by the delta likelihood and p-value
methods are slightly smaller than the corresponding intervals given
by the confidence belt for large values of r. As is seen in figure
15c, the two methods do not agree exactly on where the
transition region for producing one or two intervals is, only that it is
around r = 1. This arises since the distribution of −2 ln q does
not exactly follow a chi-square distribution for all θ.


In contrast, it is seen that the confidence belt method for all
four values of cos(Δφ) produces two disjoint confidence
intervals for large r. The reason is that the cross-like shape of the
confidence belt persists for all four values of cos(Δφ).
However, it should be mentioned that the density of pseudo
experiments is different in the two diagonal branches in the
confidence belt when cos(Δφ) ≠ 0. The branch with the negative
slope has a much lower fraction of the pseudo
experiments, the trend being that the density decreases with
increasing |cos(Δφ)|. In fact, for high enough values of |cos(Δφ)|,
the branch with negative slope in the confidence belt will
disappear, at which point the confidence belt only produces one
confidence interval.


The discrepancy between the delta likelihood, p-value and
the confidence belt methods is interesting since it implies that
the former two do not manage to fully map the relation between
the parameter of interest and its maximum likelihood estimator.
This is best understood by considering the level of information
used by the p-value method when a pseudo experiment is
performed for a given θ. As explained in section 3.3, the p-value
method counts the number of pseudo experiments where the
value of −2 ln q is larger than it is for the observation. However,
only using the value of −2 ln q does not encapsulate the fact that
there are potentially two minima in −2 ln q for each pseudo
experiment, and that the global minimum fluctuates between the
two from one pseudo experiment to the next. Consequently, the
p-value and delta likelihood methods over-constrain the
parameter.


Another interesting feature is that for negative values of the
Baur parameter, the CLs method expands the confidence
intervals compared to the p-value method. This effect becomes
more distinct as |cos(Δφ)| increases. The reason is that the
minimum in the signal prediction is not at the SM value, θ = 0,
but rather shifted towards positive (negative) values of θ for
negative (positive) values of cos(Δφ). Consequently, the
p-value for the SM, CLb, is less than one and the confidence
interval gets expanded compared to the interval from the p-value
method.
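
The shift of the minimum can be made explicit with a short
calculation. The snippet below is a sketch under assumptions:
it takes the signal prediction to be of the form
a_0 + a_1 θ + a_2 θ² with a_1 given by equation (23), drops the
x dependence, and uses illustrative values for a_0 and a_2; the
function names are chosen for the example. Under this form the
minimum lies at θ_min = −√(a_0/a_2) cos(Δφ) and takes the value
a_0 (1 − cos²(Δφ)).

```python
import numpy as np

def signal(theta, a0, a2, cos_dphi):
    """Assumed quadratic prediction with the linear term of eq. (23)."""
    a1 = 2.0 * np.sqrt(a0 * a2) * cos_dphi
    return a0 + a1 * theta + a2 * theta**2

def minimum(a0, a2, cos_dphi):
    """Closed-form location and value of the minimum of signal()."""
    theta_min = -np.sqrt(a0 / a2) * cos_dphi
    mu_min = a0 * (1.0 - cos_dphi**2)
    return theta_min, mu_min

A0, A2 = 100.0, 1.0e5   # illustrative values, not taken from the text
for c in (-0.2, -0.1, 0.0, 0.1, 0.2):
    print(c, minimum(A0, A2, c))
```

For cos(Δφ) ≠ 0 the minimum lies below the SM value a_0, so a
moderate deficit of events is still described by the model,
which is consistent with CLb dropping below one. The minimum
reaches zero only for cos(Δφ) = ±1.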

Finally, as seen in figure 15c, the CLs method for
cos(Δφ) = 0.2 produces two separated intervals for large
values of r. In order to understand this, it should first be recalled
that −2 ln q has a local maximum for large values of r (see e.g.
figure 10, red and yellow graphs). For |cos(Δφ)| above a
certain value, the difference between the p-values at the SM, CLb,
and at the local maximum, CLs+b(θ_max), becomes so large that
a region around θ_max is not included in the confidence interval,
i.e. CLs(θ_max) < 1 − α, where α denotes the confidence level.
Consequently, this gives two disjoint confidence intervals, one on
each side of θ_max.


7 Conclusion 

The effective Lagrangian approach used in most model
independent searches for BSM physics introduces a bound on the
signal prediction due to a quadratic parameter dependence in
the differential cross section. The bound is typically a lower
bound due to the non-renormalisability of the BSM terms and
is often located close to or at the SM expectation for physics
cases such as anomalous Higgs couplings, anomalous trilinear
or quartic gauge couplings.

While the original frequentist approach for determining
confidence intervals, known as the confidence belt, explicitly
computes the mapping of the observation in data into a subset of
values for the true parameter, thus giving the correct frequentist
coverage for all observational scenarios, it is demonstrated that
statistical methods currently employed at the LHC, i.e. the delta
likelihood, the p-value and the CLs methods, systematically
over-constrain the parameter when data show distinct
fluctuations into the region which is not described by the model.

The presence of an interference term between the SM and
the BSM amplitudes improves the ability of the model to
describe data in the vicinity of the SM. However, it also shows
that the delta likelihood, the p-value and the CLs methods in
general fail to map the observation in data into the full subset
of values for the parameter, even for observations which are
fully described by the model. Consequently, the experimental
sensitivity to interference terms depends on the statistical
procedure.

It should be emphasized that the present findings show that
the usual correspondence between the confidence belt and the
hypothesis test performed in the p-value method, i.e. that the
former constitutes the acceptance region of the latter, is not true
for the case where the parameter of interest enters quadratically
in the signal prediction. In fact, this statement is true for any
functional dependency on the parameter which introduces a
region not described by the model. For physics scenarios where
this is the case, the delta likelihood, the p-value and the CLs
methods are not guaranteed to provide the correct frequentist
coverage.